Fault Diagnosis of Rolling Bearing Based on a Priority Elimination Method

To address the problems that the fault diagnosis accuracy of rolling bearings is not high enough and that unknown faults cannot be correctly identified, a priority elimination (PE) method is proposed in this paper. First, the priority diagnosis sequence of faults was determined by comparing the ratios of the inter-class distance to the intra-class distance for all faults. Then, model training and fault diagnosis were carried out in the order of the priority sequence, and the samples of each fault that had been identified were eliminated from the data set until all faults were diagnosed. For the diagnosis model, the stacked sparse auto-encoder network (SSAE) was selected to extract the features of the vibration signal, and the extreme gradient boosting algorithm (XGBoost) was chosen to identify the fault type. Finally, the method was verified with experimental data and compared with classical algorithms. The results indicate the following: (1) adding PE to SSAE-XGBoost improves the fault diagnosis accuracy from 96.3% to 99.27%, which is higher than that of the other methods; (2) for the test set containing samples of unknown faults, the diagnosis accuracy of SSAE-XGBoost with PE reaches 92.34%, which is more than 5 percentage points higher than that without PE (86.96%) and is also clearly higher than that of other classical fault diagnosis methods with or without PE. The PE method can therefore both improve the diagnosis accuracy and identify unknown faults, which provides a new approach to fault diagnosis.


Introduction
As an essential component of rotating machinery, the rolling bearing is widely used in automobile manufacturing, aerospace, numerical control machines, and various electromechanical equipment [1]. Once the bearing breaks down, it may cause damage to the equipment and even cause significant economic losses and casualties. According to the research by Henriquez P et al. [2], bearings are the most easily damaged components in electromechanical equipment, whose failure rate accounts for 41% of total failures. Therefore, it is of great significance to study fault diagnosis methods of rolling bearings to ensure the stable and reliable operation of electromechanical equipment [3].
Currently, the methods commonly used for bearing fault diagnosis can be divided into two stages: feature extraction and fault identification [4].
In terms of feature extraction, the traditional approach is to use signal processing to extract the fault features from the vibration signal in the time domain, frequency domain, or time-frequency domain [5]. It mainly includes the wavelet transform (WT) [6], empirical mode decomposition (EMD) [7], the short-time Fourier transform (STFT) [8], etc. In recent years, deep learning theory has matured and computing power has improved significantly. Many deep learning models, such as the convolutional neural network (CNN), the deep belief network (DBN), the auto-encoder (AE), and the long short-term memory network (LSTM) [9], have been applied in fault diagnosis because they can deal with complex and high-dimensional problems in massive data that cannot be solved by shallow learning [10], and they offer advantages including high efficiency, plasticity, and universality.
CNN is a supervised multi-layer neural network model, mainly composed of convolutional, pooling, and fully connected layers. It realizes fault diagnosis by extracting local features of the vibration signal layer by layer. Hoang et al. [11] proposed a method for bearing fault diagnosis based on the deep structure of CNN. Kim [12] studied a direct-connection-based CNN (DC-CNN) method, which can significantly improve training efficiency and diagnosis performance. Additionally, a rolling-element bearing fault diagnosis method using an improved 2D LeNet-5 network has been proposed to satisfy the requirements of fault diagnosis of rolling bearings [13].
DBN is composed of multiple layers of restricted Boltzmann machines (RBM) and a layer of backpropagation (BP) neural network [14]. An adaptive DBN fault diagnosis model based on Nesterov momentum (NM) optimization was investigated to extract deep representative features [15]. Gao et al. [16] proposed a new optimized adaptive DBN with high diagnostic accuracy and good convergence to analyze the vibration signal of rolling bearings.
AE is an unsupervised learning method consisting of three layers of neurons. It has been widely used in the fault diagnosis of equipment. Shao et al. [17] proposed a novel fault diagnosis method for rolling bearings based on the deep wavelet auto-encoder (DWAE) and extreme learning machine (ELM). Zhao et al. [18] presented an intelligent fault diagnosis method for rotating machinery based on a semi-supervised deep sparse auto-encoder (SSDSAE). Recently, Huang et al. [19] developed a deep learning-based model, namely, the memory residual regression auto-encoder (MRRAE), to improve the accuracy of anomaly detection in bearing condition monitoring.
In the aspect of fault identification, commonly used methods are the support vector machine (SVM), artificial neural network (ANN), ensemble learning (EL), etc. SVM is a machine learning method based on statistical learning theory. It has been widely applied due to its high accuracy and good generalization ability [20]. Zhu et al. [21] input the fault feature vectors into an SVM classifier to automatically accomplish bearing fault identification. The structure of ANNs is often determined empirically, and their recognition accuracy is related to the number of training samples. BP is the most commonly used algorithm in ANN. Song et al. [22] improved the traditional BP neural network and increased the diagnosis efficiency of BP neural networks. To learn and distinguish features adaptively from the original data, a multiscale local feature learning method based on the BP neural network (BPNN) for rolling bearings' fault diagnosis was proposed by J. Li [23]. Extreme gradient boosting (XGBoost), an ensemble learning method, has been proven to have high accuracy and fast processing time [24]. In reference [25], the XGBoost was adopted as the final classifier, and good results were achieved.
The fault diagnosis methods of rolling bearings described above can identify all fault types in the test set, but all the fault types they identified were trained and labeled in advance. If there are new unknown faults in the test set, these methods will identify them as the most similar faults and fail to identify them as belonging to new fault types. It will not only reduce the accuracy of fault diagnosis but may also cause serious harm to the electromechanical equipment.
Considering the above, the purpose of this paper is to present a novel priority elimination (PE) method combined with the stacked sparse auto-encoder network (SSAE) and the extreme gradient boosting algorithm (XGBoost) for the fault diagnosis of rolling bearings, to improve the diagnosis accuracy and correctly identify new unknown faults. The PE is used to determine the diagnosis sequence, SSAE is applied to extract fault features, and XGBoost is used to identify fault types. This paper is organized as follows. Section 2 expounds on the theoretical methods of the SSAE, XGBoost, and PE. The diagnosis procedure based on PE is described in detail in Section 3. Section 4 verifies the effectiveness of the proposed method using the experimental data of rolling bearing faults of Case Western Reserve University as an example. Conclusions are enclosed in Section 5.

Priority Elimination Method
The PE method is mainly used to determine the diagnosis sequence of different fault types. Its operation process is as follows:
Step 1. Adopt t-distributed stochastic neighbor embedding (t-SNE) [26] to reduce the vibration signal features from multiple dimensions to two dimensions.
Step 2. Calculate the intra-class distance between different samples of each fault type to form an intra-class distance matrix $S_w$.
Suppose $G_p$ is a fault type in the training set that contains $n_p$ samples. Its intra-class distance $D$ is computed from the distances $d(X_k^{(i)}, X_l^{(i)})$, where $X_k^{(i)}$ and $X_l^{(i)}$ are different samples in $G_p$ and $d(\cdot)$ is the Euclidean distance between them.
The calculated intra-class distances of all fault types are taken as the diagonal elements to form the intra-class distance matrix $S_w$.
Step 3. Calculate the inter-class distance between different samples in different fault types to form an inter-class distance matrix $S_b$.
Assume that $G_p$ and $G_q$ are two different fault types in the training set, which contain $n_p$ and $n_q$ samples, respectively. The distance between these two fault types can be expressed in various ways, such as the nearest distance method, which is defined as
$$D_{pq} = \min_{X_i \in G_p,\; X_j \in G_q} d_{ij}$$
where $d_{ij}$ denotes the distance between the sample $X_i$ in $G_p$ and the sample $X_j$ in $G_q$. The nearest distance method defines the shortest distance between two fault types as the inter-class distance. Conversely, if the maximum distance between two fault types is defined as the inter-class distance, it is called the farthest distance method.
In addition, there is an intermediate distance method, which is a compromise between the nearest and the farthest distance methods. It combines $G_p$ and $G_q$ to form a new type $G_n$, and then calculates the distance between any other type $G_l$ and $G_n$. This distance is called the intermediate distance. If the intermediate distance method also considers the number of samples in each fault type, it is called the barycenter distance method.
This paper adopts the average distance method. The inter-class distance is the average distance between any two different samples in any two different fault types, which is defined as
$$D_{pq} = \frac{1}{n_p n_q} \sum_{X_i \in G_p} \sum_{X_j \in G_q} d_{ij} \tag{6}$$
According to Formula (6), the inter-class distance between each fault and every other fault is calculated, and then the sum of the inter-class distances between each fault type and the other fault types is calculated to form the inter-class distance matrix $S_b$.
Step 4. Calculate the ratio of the inter-class distance to the intra-class distance for each fault. The larger the inter-class distance between a fault type and the others, and the smaller its intra-class distance, the larger the ratio. A large ratio means that the fault has more obvious features and is easier to identify. The calculated distance ratios are ranked in descending order, and the priority diagnosis sequence of faults is obtained.
The operation process described above is summarized in the flow chart shown in Figure 1.
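The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the features are assumed to be already reduced to two dimensions (for example by t-SNE), the intra-class distance is taken to be the mean Euclidean distance over all pairs of distinct samples in a class (an assumed form), and the inter-class distance uses the average distance method. All function names are hypothetical.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def intra_class_distance(x):
    """Assumed form: mean distance over all ordered pairs of distinct samples."""
    n = len(x)
    d = pairwise_dist(x, x)
    return d.sum() / (n * (n - 1))  # the diagonal is zero, so only k != l contributes

def inter_class_distance(xp, xq):
    """Average distance method: mean of d_ij over all cross-class pairs."""
    return pairwise_dist(xp, xq).mean()

def priority_sequence(features, labels):
    """Return the classes sorted by descending S_b / S_w, plus the ratios."""
    classes = [int(c) for c in np.unique(labels)]
    groups = {c: features[labels == c] for c in classes}
    ratios = {}
    for c in classes:
        s_w = intra_class_distance(groups[c])
        # Sum of the inter-class distances to every other class.
        s_b = sum(inter_class_distance(groups[c], groups[o])
                  for o in classes if o != c)
        ratios[c] = s_b / s_w
    order = sorted(classes, key=lambda c: ratios[c], reverse=True)
    return order, ratios

# Toy 2-D features for three classes (illustrative values only).
features = np.array([[0.0, 0.0], [0.0, 1.0],
                     [10.0, 0.0], [10.0, 4.0],
                     [20.0, 0.0], [20.0, 0.5]])
labels = np.array([0, 0, 1, 1, 2, 2])
order, ratios = priority_sequence(features, labels)
```

In this toy example the tightest and most isolated class comes first in `order`, which mirrors the idea that faults with a large ratio are diagnosed earlier.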

SSAE Network
The block diagram and unfolding structure of the auto-encoder network (AE) are shown in Figure 2.
AE consists of an encoder and a decoder. $x$, $y$, and $\hat{x}$ correspond to the input layer, hidden layer, and output layer, respectively. The encoder transforms the input vector $x$ into the coding vector $y$, and the decoder then converts $y$ into the output vector $\hat{x}$, which is also known as the reconstruction vector of $x$. The transformation is as follows:
$$y = f(Wx + b), \qquad \hat{x} = g(W'y + b')$$
where $f$ and $g$ are the activation functions for the encoding and decoding processes, respectively, and $W'$ and $b'$ denote the weight matrix and offset vector of the decoder. AE usually selects the sigmoid activation function, whose expression is
$$f(z) = \frac{1}{1 + e^{-z}}$$
Sensors 2023, 23, 2320
AE does not focus on the network's output, but on the coding, i.e., the mapping from the input to the coding vector. The coding vector $y$ is a mapping of the input vector $x$. The proximity between the output vector $\hat{x}$ and the input vector $x$ is calculated to measure the quality of the AE network construction. The mean square error is used as the loss function $L$, which is also called the reconstruction error:
$$L(W, b) = \frac{1}{d} \sum_{i=1}^{d} \left\lVert \hat{x}^{(i)} - x^{(i)} \right\rVert^2$$
where $(W, b)$ is the parameter set of the network, $W$ is the weight matrix, $b$ is the offset vector, and $d$ is the number of samples. The loss function is minimized by iterative optimization. Once it is minimized, the network is considered to contain most of the information about the input vectors, and the parameter set has captured the best implicit relationship for the input vectors.
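As a rough illustration of the encoding, decoding, and reconstruction error described above, the following NumPy sketch runs one forward pass of a single AE. It assumes sigmoid activations for both the encoder and the decoder and a row-per-sample data layout; the function and parameter names are hypothetical and not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def ae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode x into the coding vector y, then decode y into x_hat."""
    y = sigmoid(x @ w_enc + b_enc)      # coding vector (hidden layer)
    x_hat = sigmoid(y @ w_dec + b_dec)  # reconstruction of x
    return y, x_hat

def reconstruction_error(x, x_hat):
    """Mean over the samples of the squared reconstruction error."""
    return float(np.mean(np.sum((x_hat - x) ** 2, axis=1)))

# Four samples with three features, compressed to two hidden units.
x = np.zeros((4, 3))
w_enc, b_enc = np.zeros((3, 2)), np.zeros(2)
w_dec, b_dec = np.zeros((2, 3)), np.zeros(3)
y, x_hat = ae_forward(x, w_enc, b_enc, w_dec, b_dec)
err = reconstruction_error(x, x_hat)
```

With all-zero parameters every sigmoid outputs 0.5, so the example is only a shape and arithmetic check; in practice the parameters would be learned by minimizing the reconstruction error.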
The sparse auto-encoder (SAE) is obtained by adding constraints to the AE. To avoid overfitting of the network, a sparse penalty term is added to the original loss function. SAE generally selects the Kullback-Leibler (KL) divergence as the sparse penalty term, and the improved loss function $J$ is
$$J = L(W, b) + \alpha \sum_{m} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_m\right), \qquad \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_m\right) = \rho \log\frac{\rho}{\hat{\rho}_m} + (1 - \rho) \log\frac{1 - \rho}{1 - \hat{\rho}_m}$$
where $\alpha$ is the coefficient of the sparse penalty term, $\rho$ is the sparse parameter, and $\hat{\rho}_m$ is the average activation of the $m$-th node of the hidden layer.
An ordinary SAE has only three layers and thus has difficulty learning all the interior features of the input vector and obtaining the deep hidden relationships in the data. Therefore, Bengio et al. [27] proposed stacking shallow SAEs to form a stacked SAE network (SSAE). In SSAE, each SAE is trained separately to obtain the parameters of each layer of the network, and the hidden layer of the lower SAE is used as the input layer of the higher SAE. Due to the increased network depth, SSAE is prone to overfitting, so a regularization term is added to the improved loss function $J$. The new loss function $J_{SSAE}$ is expressed as
$$J_{SSAE} = J + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left(W_{ji}^{(l)}\right)^2$$
where $\lambda$ is the coefficient of the regularization term, $n_l$ is the total number of network layers, $s_l$ is the number of nodes in layer $l$, and $W_{ji}^{(l)}$ is the network parameter matrix of layer $l$.
The coding process passes the input through the encoders of the stacked SAEs layer by layer. Assuming there are $n$ coding layers in the coding process, the decoding process then applies the corresponding decoders in reverse order.
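The way the sparse penalty and the regularization term are added to the reconstruction error can be sketched as follows. This is an illustrative NumPy version under assumptions: the KL term is summed over the hidden nodes, the regularization term is half the sum of squared weights scaled by λ, and the default values of `rho`, `alpha`, and `lam` are placeholders rather than the paper's settings.

```python
import numpy as np

def kl_sparsity(rho, rho_hat):
    """Sum over hidden nodes of KL(rho || rho_hat_m) for Bernoulli activations."""
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)  # keep the logarithms finite
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))

def ssae_loss(recon_error, hidden_activations, weights,
              rho=0.05, alpha=1.0, lam=1e-4):
    """Reconstruction error + sparse penalty + weight regularization."""
    rho_hat = hidden_activations.mean(axis=0)  # average activation per node
    sparse = alpha * kl_sparsity(rho, rho_hat)
    decay = 0.5 * lam * sum(float(np.sum(w ** 2)) for w in weights)
    return recon_error + sparse + decay

# When every hidden node already has average activation rho, the penalty vanishes.
hidden = np.full((5, 3), 0.05)
loss = ssae_loss(1.0, hidden, [np.ones((2, 2))], rho=0.05, alpha=3.0, lam=0.1)
```

Here `loss` reduces to the reconstruction error plus the weight term, because the KL penalty is zero when the average activations equal the sparse parameter.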

XGBoost Algorithm
XGBoost is a decision-tree-based algorithm proposed by Dr. Chen at the University of Washington [28]. The objective function is defined as
$$Obj^{(t)} = \sum_{i} l\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \sum_{k} \Omega(f_k) \tag{19}$$
where $l$ is the loss function, $y_i$ is the actual value, $\hat{y}_i^{(t-1)}$ is the predicted value for round $t-1$, $f_t(x_i)$ is the score function of the samples in round $t$, and the final predicted value is their sum. $\Omega(f_k)$ represents the complexity function of the tree; the smaller its value, the lower the tree's complexity and the stronger the generalization ability:
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$
where $T$ is the number of leaf nodes, $\omega$ represents the value or class of the node, and $\lambda$ and $\gamma$ are scale factors. $\lVert \omega \rVert^2$ represents the $L_2$ regularization of $\omega$. Next, the loss function in Equation (19) is expanded by a second-order Taylor expansion, from which the first- and second-order derivatives are obtained. Finally, after simplification, the objective function $Obj^{(t)}$ becomes
$$Obj^{(t)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$
where $G_j$ and $H_j$ are the sums of the first- and second-order derivatives over the samples in leaf $j$, respectively.
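To make the final form of the objective concrete, the sketch below evaluates it for a tree with two leaves. The leaf weight formula used here, $-G_j / (H_j + \lambda)$, comes from the standard XGBoost derivation and is not stated in the text above; the function names and the numbers are illustrative only.

```python
import numpy as np

def leaf_weights(G, H, lam):
    """Optimal leaf values in the standard XGBoost derivation: -G_j / (H_j + lambda)."""
    G, H = np.asarray(G, dtype=float), np.asarray(H, dtype=float)
    return -G / (H + lam)

def structure_score(G, H, lam, gamma):
    """Objective value -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
    G, H = np.asarray(G, dtype=float), np.asarray(H, dtype=float)
    return float(-0.5 * np.sum(G ** 2 / (H + lam)) + gamma * len(G))

# Per-leaf sums of first- and second-order derivatives for a two-leaf tree.
G = [2.0, -4.0]
H = [1.0, 3.0]
w = leaf_weights(G, H, lam=1.0)
score = structure_score(G, H, lam=1.0, gamma=0.5)
```

A lower score indicates a better tree structure, and the `gamma * T` term penalizes trees with more leaves.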

Fault Diagnosis Process
First, the PE method is adopted to prioritize the fault diagnosis sequence, and then the SSAE-XGBoost model is used to extract fault features and identify fault types. The whole fault diagnosis process is shown in Figure 3.
As can be seen from Figure 3, the detailed diagnosis steps are as follows:
Step 1. Divide the vibration signal of the rolling bearing into a training set and a test set.
Step 2. Determine the priority diagnosis sequence of faults according to the PE method, and assume the priority diagnosis sequence is $X_1 > X_2 > X_3 > \dots > X_n$.
Step 3. Train the SSAE-XGBoost model according to the priority diagnosis sequence of faults above. The detailed training steps are as follows: First, all the data in the training set are used for model training to diagnose the fault $X_1$ with the highest diagnostic priority and obtain the first diagnosis model. Then, the samples of fault $X_1$ are eliminated from the training set. Next, the remaining samples are used for model training to diagnose the fault $X_2$ and obtain the second diagnosis model. Then, the samples of fault $X_2$ are eliminated from the training set. These steps are repeated until all the trained SSAE-XGBoost fault diagnosis models are obtained.
Step 4. Verify the PE-SSAE-XGBoost method with the test set. Perform the fault diagnosis in the priority diagnosis sequence until all known faults are diagnosed. If there are remaining samples in the test set, they are considered to be samples of unknown new faults.
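The train-then-eliminate loop in Steps 3 and 4 can be sketched as a cascade of per-fault models. The code below is a simplified illustration and not the paper's implementation: a nearest-centroid rule (`CentroidStage`, a hypothetical stand-in) replaces each SSAE-XGBoost model, and the last stage, which has no other classes left to compare against, accepts a sample only if it lies within the training radius of its class. That acceptance rule is an assumption made here so that leftover samples can be flagged as unknown.

```python
import numpy as np

class CentroidStage:
    """Stand-in for one SSAE-XGBoost stage: target fault vs. all others."""

    def fit(self, X, is_target):
        target, rest = X[is_target], X[~is_target]
        self.c_target = target.mean(axis=0)
        self.c_rest = rest.mean(axis=0) if len(rest) else None
        # Largest training distance to the target centroid (used by the last stage).
        self.radius = np.linalg.norm(target - self.c_target, axis=1).max()
        return self

    def predict(self, X):
        d_target = np.linalg.norm(X - self.c_target, axis=1)
        if self.c_rest is None:  # last stage: assumed radius-based acceptance
            return d_target <= self.radius
        return d_target < np.linalg.norm(X - self.c_rest, axis=1)

def train_cascade(X, y, order, make_model=CentroidStage):
    """Train one model per fault in priority order, eliminating each fault afterwards."""
    models, keep = [], np.ones(len(y), dtype=bool)
    for fault in order:
        models.append((fault, make_model().fit(X[keep], y[keep] == fault)))
        keep &= (y != fault)  # eliminate the samples of the diagnosed fault
    return models

def diagnose(models, X, unknown=-1):
    """Diagnose in priority order; samples never claimed stay labeled unknown."""
    pred = np.full(len(X), unknown)
    remaining = np.ones(len(X), dtype=bool)
    for fault, model in models:
        idx = np.flatnonzero(remaining)
        if idx.size == 0:
            break
        hit = model.predict(X[idx])
        pred[idx[hit]] = fault
        remaining[idx[hit]] = False  # eliminate the diagnosed samples
    return pred

# Toy data: three known faults on a line, plus one far-away unknown test sample.
X_train = np.array([[-0.2, 0.0], [0.2, 0.0], [4.8, 0.0],
                    [5.2, 0.0], [9.8, 0.0], [10.2, 0.0]])
y_train = np.array([0, 0, 1, 1, 2, 2])
models = train_cascade(X_train, y_train, order=[0, 1, 2])
X_test = np.array([[0.1, 0.0], [5.1, 0.0], [10.1, 0.0], [30.0, 0.0]])
pred = diagnose(models, X_test)
```

The last test sample is not claimed by any stage, so it keeps the `unknown` label, which corresponds to the remaining samples in Step 4.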

Experimental Data
In this study, the experimental data set of rolling bearings was from the Electronic Engineering Laboratory of Case Western Reserve University (CWRU). The experimental platform is shown in Figure 4. The data selected for this study were fan-end (FE) and drive-end (DE) data with a motor load of 3 hp, a sampling frequency of 12 kHz, and a rotating speed of 1730 rpm. They included the normal state and the fault states listed in Table 1. A total of 70% of the samples were used as the training set and the remaining 30% were used as the test set. The computing platform is described as follows: the software used was the PyCharm development environment based on Python. The main configuration parameters of the PC were the CPU (Intel Core i7-8750H, Santa Clara, CA, USA), graphics card (NVIDIA GeForce RTX 2060, Santa Clara, CA, USA), and memory (16 GB).

Priority Diagnosis Sequence
According to the PE method, the intra-class distance $S_w$ and inter-class distance $S_b$ of each fault state described in Table 1 are calculated first. The calculated $S_w$ is illustrated in Figure 5, where fault states 1 (DE) and 1 (FE) represent the inner race faults (IRFs) with a damage diameter of 0.1778 mm, labeled 1 in Table 1, which are collected at the DE and FE, respectively. The meanings of the other fault states in Figure 5 and the following figures are similar. As shown in Figure 5, the $S_w$ of 8 (DE), i.e., the intra-class distance of the ball fault (BAF) with a damage diameter of 0.3556 mm at the drive end, is the largest, and the $S_w$ of 2 (FE), i.e., the intra-class distance of the IRF with a damage diameter of 0.3556 mm at the fan end, is the smallest. The calculated $S_b$ between different fault states is displayed in Figure 6.
Next, according to Figure 6, the sum of the inter-class distances $S_b$ between each fault state and the other fault states is calculated, as shown in Figure 7. It can be seen that the inter-class distances of 1 (FE) and 2 (DE) are the largest and smallest, respectively.
Then, according to the calculated $S_w$ and $S_b$ shown in Figures 5 and 7, the ratios of $S_b$ to $S_w$ for each fault state can be obtained, which are presented in Figure 8. As we can see from the ratios, the priority diagnosis sequence is 1 (FE) > 2 (FE) > 4 (DE) > … > 8 (DE). If the fault diagnosis is carried out according to this priority sequence, the maximum diagnosis time is up to 71.58 s, which does not satisfy the requirement of fast, real-time fault diagnosis. Therefore, we treat all samples of the same fault type (IRF, BAF, or outer race fault (ORF)) with different diameters as one data set for diagnosis, which reduces the diagnosis time to 36.59 s. The average ratios of $S_b$ to $S_w$ for each fault type are calculated and summarized in Table 2.
It can be seen from Table 2 that the order of the ratios of $S_b$ to $S_w$ for all fault types, i.e., the diagnosis sequence, is IRF > ORF > BAF. A graphical diagram of the diagnosis process is illustrated in Figure 9. Because the features of the normal state (label 0) are obviously different from those of the fault states (labels 1~9), the fault diagnosis model first diagnoses the normal state and then eliminates its samples from the data set. Next, faults are diagnosed according to the priority sequence of IRF, ORF, and BAF, and the samples of each fault type are eliminated from the data set in turn. If there are remaining undiagnosed samples in the data set after the last fault type, BAF, has been diagnosed, they are identified as samples of unknown faults.
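The grouping of fault states into fault types can be illustrated with a short sketch that averages the per-state ratios within each type and then ranks the types. The ratio values below are placeholders chosen only to demonstrate the calculation; they are not the values in Figure 8 or Table 2. The mapping of labels 1~3 to IRF, 4~6 to ORF, and 7~9 to BAF follows the labels used in this paper, and the function name is hypothetical.

```python
from collections import defaultdict

def type_priority(state_ratios, state_type):
    """Average the S_b / S_w ratios per fault type and rank types in descending order."""
    grouped = defaultdict(list)
    for state, ratio in state_ratios.items():
        grouped[state_type[state]].append(ratio)
    avg = {t: sum(v) / len(v) for t, v in grouped.items()}
    return sorted(avg, key=avg.get, reverse=True), avg

# Labels 1~3 are IRF, 4~6 are ORF, and 7~9 are BAF.
state_type = {1: "IRF", 2: "IRF", 3: "IRF",
              4: "ORF", 5: "ORF", 6: "ORF",
              7: "BAF", 8: "BAF", 9: "BAF"}
# Placeholder ratios for illustration only (not the paper's measurements).
state_ratios = {1: 9.0, 2: 8.0, 3: 7.0,
                4: 6.0, 5: 5.0, 6: 4.0,
                7: 3.0, 8: 2.0, 9: 1.0}
type_order, type_avg = type_priority(state_ratios, state_type)
```

Ranking three fault types instead of every fault state shortens the cascade, which is the motivation given above for the reduced diagnosis time.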

Diagnosis Results
In this part, the priority diagnostic sequences obtained by the PE method are put into SSAE and XGBoost to train the diagnostic models. Then, the effectiveness of the proposed method is verified by the test set.
The parameter setting of the SSAE-XGBoost model is listed in Table 3. The number of hidden layers of SSAE is set to 3, the network structure is set to 1024-512-256-128-10, the Adam algorithm is selected to optimize the network, and the number of iterations is set to 60. The most essential parameters of the classifier XGBoost are the maximum depth of a tree (max depth), the minimum sum of instance weight needed in a child (min child weight), the number of decision trees (n estimators), and the learning rate. We choose 5 as the maximum depth, the minimum sum of instance weight needed in a child is set to 1, 80 decision trees are constructed, and the learning rate is set to 0.12. First, according to Figure 9, the normal state (label 0) in the test set is diagnosed, and all fault states are labeled as others. The obtained confusion matrix of the first diagnosis results is shown in Figure 10. It can be seen that the diagnosis accuracy reaches 98.2%; thus, the samples of normal state and fault states can be well distinguished. Then, the samples of the normal state (label 0) are eliminated from the test set. First, according to Figure 9, the normal state (label 0) in the test set is diagnosed, and all fault states are labeled as others. The obtained confusion matrix of the first diagnosis results is shown in Figure 10. It can be seen that the diagnosis accuracy reaches 98.2%; thus, the samples of normal state and fault states can be well distinguished. Then, the samples of the normal state (label 0) are eliminated from the test set. Second, the IRFs with three different fault diameters (labels 1~3) are diagnosed, and remaining fault states are labeled as others. The confusion matrix of the second diagnosis results is shown in Figure 11. As we can see, the fault diagnosis accuracy is 99.67%. Then, the samples of all diagnosed IRFs are eliminated from the test set.  
Second, the IRFs with three different fault diameters (labels 1~3) are diagnosed, and remaining fault states are labeled as others. The confusion matrix of the second diagnosis results is shown in Figure 11. As we can see, the fault diagnosis accuracy is 99.67%. Then, the samples of all diagnosed IRFs are eliminated from the test set. First, according to Figure 9, the normal state (label 0) in the test set is diagnosed, and all fault states are labeled as others. The obtained confusion matrix of the first diagnosis results is shown in Figure 10. It can be seen that the diagnosis accuracy reaches 98.2%; thus, the samples of normal state and fault states can be well distinguished. Then, the samples of the normal state (label 0) are eliminated from the test set. Second, the IRFs with three different fault diameters (labels 1~3) are diagnosed, and remaining fault states are labeled as others. The confusion matrix of the second diagnosis results is shown in Figure 11. As we can see, the fault diagnosis accuracy is 99.67%. Then, the samples of all diagnosed IRFs are eliminated from the test set.  Third, the ORFs with three different fault diameters (labels 4~6) are diagnosed, and remaining fault states are labeled as others. The confusion matrix of the third diagnosis results is shown in Figure 12. It can be seen the accuracy of this fault diagnosis is 99.5%. Then, all diagnosed ORFs are eliminated from the test set. Finally, the BAFs with three different fault diameters (labels 7~9) are diagnosed. The confusion matrix of the fourth diagnosis results is shown in Figure 13. As we can see, the fault diagnosis accuracy is 99%. According to Figures 10-13, the accuracy of fault diagnosis of rolling bearings with the PE method can be calculated as high as 99.27%. Without the PE method, the confusion matrix of the diagnosis results is shown in Figure 14, and the fault diagnosis accuracy is only 96.3%. The PE method significantly improves the diagnosis accuracy. 
Finally, the BAFs with three different fault diameters (labels 7~9) are diagnosed. The confusion matrix of the fourth diagnosis results is shown in Figure 13. As we can see, the fault diagnosis accuracy is 99%. Finally, the BAFs with three different fault diameters (labels 7~9) are diagnosed. The confusion matrix of the fourth diagnosis results is shown in Figure 13. As we can see, the fault diagnosis accuracy is 99%. According to Figures 10-13, the accuracy of fault diagnosis of rolling bearings with the PE method can be calculated as high as 99.27%. Without the PE method, the confusion matrix of the diagnosis results is shown in Figure 14, and the fault diagnosis accuracy is only 96.3%. The PE method significantly improves the diagnosis accuracy. According to Figures 10-13, the accuracy of fault diagnosis of rolling bearings with the PE method can be calculated as high as 99.27%. Without the PE method, the confusion matrix of the diagnosis results is shown in Figure 14, and the fault diagnosis accuracy is only 96.3%. The PE method significantly improves the diagnosis accuracy. Furthermore, the proposed method is compared with classical fault diagnosis methods, such as CNN, SVM, and DBN, to further validate the advantages of this method. The comparison of diagnosis accuracy is listed in Table 4 (the SVM (94.78%) and DBN (93.19%) data are from previously published papers [29]). It can be seen from Table 4 that for CNN, SVM, and DBN, the PE method can improve the fault diagnosis accuracy. Especially for the proposed PE-SSAE-XGBoost, its fault diagnosis accuracy is significantly higher than other methods with or without PE. The identification ability of this method for unknown faults is verified next. We select the samples of the IRF and BAF at 3 HP, 1730 rpm, and 0.7112 mm fault diameter and IRF at 0 HP, 1797 rpm, and 0.7112 mm fault diameter as the data set of the unknown faults and add them into the test set. 
The two-dimensional visualization of the new test set is shown in Figure 15. After adding the samples of unknown faults to the test set, the comparison of fault diagnosis accuracy between the PE method and other methods is listed in Table 5. Table 5 shows that, for the new test set with unknown faults, the PE method also improves the fault diagnosis accuracy of SSAE-XGBoost, CNN, SVM, and DBN. In particular, the fault diagnosis accuracy of PE-SSAE-XGBoost reaches 92.34%, whereas that of SSAE-XGBoost without the PE method is only 86.96%. The methods without PE have a low diagnosis accuracy because they cannot distinguish unknown faults and instead identify them as the most similar known faults, which reduces the fault diagnosis accuracy.
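The effect of unknown faults on accuracy can be illustrated with a small calculation. The labels and predictions below are hypothetical toy values, not data from Table 5, and `UNKNOWN` and `accuracy` are illustrative names. A classifier that can only output known labels necessarily counts every unknown sample as an error, so its accuracy cannot exceed the share of known samples in the test set. A method that can set unknown samples aside may recover some of them.

```python
UNKNOWN = "unknown"  # hypothetical label for faults absent from the training set

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy test set: six samples of known faults plus two samples of unknown faults.
y_true = [1, 1, 4, 4, 7, 7, UNKNOWN, UNKNOWN]

# Closed-set prediction: each unknown sample is forced into a known label.
closed_set_pred = [1, 1, 4, 4, 7, 7, 1, 7]

# Prediction with a rejection option: one unknown is flagged, one is missed.
open_set_pred = [1, 1, 4, 4, 7, 7, UNKNOWN, 7]

closed_acc = accuracy(y_true, closed_set_pred)
open_acc = accuracy(y_true, open_set_pred)

# Upper bound for any closed-set classifier: the share of known samples.
ceiling = sum(t != UNKNOWN for t in y_true) / len(y_true)

print(closed_acc)  # 0.75
print(open_acc)    # 0.875
print(ceiling)     # 0.75
```

In this toy case the closed-set prediction is perfect on the known faults and still stops at the ceiling of 0.75, while flagging one of the two unknown samples raises the accuracy to 0.875.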

Conclusions
This paper proposed a PE method, combined with SSAE and XGBoost, to improve the fault diagnosis accuracy of rolling bearings and the ability to identify unknown faults. The following conclusions can be drawn: (1) Regarding fault diagnosis accuracy, PE improves the accuracy of all tested methods. The SSAE-XGBoost model combined with the PE method increases the fault diagnosis accuracy from 96.3% to 99.27%, which is also significantly higher than that of the classical algorithms with or without PE. (2) Regarding the identification of unknown faults, when fault data that do not appear in the training set are added to the test set, SSAE-XGBoost with PE improves the fault diagnosis accuracy from 86.96% to 92.34%, which is superior to the other classical fault diagnosis methods with or without PE.
In conclusion, the proposed PE method has achieved good results in improving the accuracy of fault diagnosis and identifying unknown faults. It provides a new method for fault diagnosis that is suitable for data-driven fault diagnosis of various mechanical equipment.
Data Availability Statement: The data sets used and/or analyzed during the current study are available from the corresponding author on reasonable request.